Changes in demography and geographic distribution in the weeping pinyon pine (Pinus pinceana) during the Pleistocene

Abstract Climate changes, together with geographical barriers imposed by the Sierra Madre Oriental and the Chihuahuan Desert, have shaped the genetic diversity and spatial distribution of different species in northern Mexico. Pinus pinceana Gordon & Glend. tolerates extremely arid conditions. Northern Mexico became more arid during the Quaternary, modifying ecological communities. Here, we try to identify the processes underlying the demographic history of P. pinceana and characterize its genetic diversity using 3100 SNPs from genotyping by sequencing 90 adult individuals from 10 natural populations covering the species' entire geographic distribution. We inferred its population history and contrasted possible demographic scenarios of divergence that modeled the genetic diversity present in this restricted pinyon pine; in support, the past distribution was reconstructed using climate from the Last Glacial Maximum (LGM, 22 kya). We inferred that P. pinceana diverged into two lineages ~2.49 Ma (95% CI 3.28–1.62), colonizing two regions: the Sierra Madre Oriental (SMO) and the Chihuahuan Desert (ChD). Our results of population genomic analyses reveal the presence of heterozygous SNPs in all populations. In addition, low migration rates across regions are probably related to glacial–interglacial cycles, followed by the gradual aridification of the Chihuahuan Desert during the Holocene.


| INTRODUC TI ON
Chihuahuan Desert (ChD) communities have a long history of high variability in their spatial dimensions, driven primarily by water input, which has fluctuated over time (Noy-Meir, 1973;Zavala-Hurtado & Jiménez, 2020).
Geological and climatic events during the Pleistocene affected the diversity patterns of ChD communities (Scheinvar et al., 2020).
Several phylogeographic studies have reported dynamic scenarios where populations migrated and expanded to the south through the Central Mexican Plateau during the interglacial (15-20 glacial cycles), followed by events of recolonization of the northern ChD (Van Devender & Burgess, 1985). Such is the case for Ephedra compacta (Loera et al., 2017), Agave lechuguilla , and Pinus remota, which retreated to Pleistocene refugia in the Bolsón de Mapimí (Lanner & Van Devender, 1981).
The ChD is a high-elevation desert ranging from 600 to 1675 m a.s.l. and receives more rain than other deserts (235 mm of mean annual precipitation; National Park Service, https://www.nps. gov/im/chdn/ecore gion.htm). The ChD is the third-largest desert in the American continent, considered the most diverse in the Western Hemisphere and one of the most diverse arid regions in the world (Beck & Gibbens, 1999). It also has both a high species diversity and richness of endemism (Rzedowski & Calderón de Rzedowski, 1993).
The ChD landscape maintains different vegetation types, such as grass steppes, xeric shrubs in the intermountain plains and valleys, and open woodlands at higher elevations (Czaja et al., 2014;Van Devender & Burgess, 1985).
The Sierra Madre Oriental (SMO) is a high mountain range (1480-3720 m a.s.l.) dating to the Late Cretaceous and early Tertiary Laramide (80-40 Ma; Eguiluz de Antuñano et al., 2000). With a high climatic diversity due to its complex physiographic heterogeneity and meteorological phenomena (Hernandez-Cerda & Carrasco-Anaya, 2004), the SMO receives moisture from the Gulf of Mexico; while the eastern slopes have tropical and temperate forests, the western slope is much drier with xeric scrubs and pine-oak-juniper woodlands Rzedowski, 1978).
The ChD biota has experienced climate fluctuations throughout geologic time. Some of these fluctuations occurred during the Late Quaternary. Paleoecological reconstructions suggest that a woodland corridor covered the area between the SMO and the Sierra Madre Occidental in Mexico (Lanner & Van Devender, 1981;Metcalfe, 2006). One of several reductions in pinyon-juniper and oak woodland occurred during the last 2.5 M years (Pleistocene), which included intense environmental and climatic changes. During the Pleistocene, 11 climatic cycles of growth and reduction in the polar ice cap occurred in North America, affecting the global climatic conditions and consequently the composition of species associated with this woodland corridor. In particular, during the Late Wisconsin glaciation, the vegetation that covered most of the ChD during the Pleistocene was composed of pinyon pines, junipers, and oaks, which started to decrease in extension ca. 11 kya (Holocene), when the reduction in the paleolakes formed during the interglacial period (~115-117 kya) turned this corridor into a warmer and drier area, resulting in a shift to xeric vegetation (Castiglia & Fawcett, 2006;Czaja et al., 2014;Elias & Van Devender, 1990;Lanner & Van Devender, 1981;Ortega-Ramírez et al., 1998;Ramírez et al., 2003;Van Devender & Burgess, 1985).
Pinus pinceana is an endangered conifer, considered a paleorelict (Perry, 1991;SEMARNAT 059, 2010) that inhabits rocky soils with extreme aridity (Passini, 1985). It is locally restricted to the slopes of ChD in the western slopes of the SMO (1480-3000 m a.s.l. ;Farjon et al., 1997) with a mean precipitation of 300-400 mm over the summer (Perry, 1991). The dominant trees in these communities are Pinus cembroides, Juniperus spp., and Yucca spp. Morphological differences, like differential needle wax cover, have been reported across the distribution of P. pinceana (Martiñón-Martínez et al., 2010;Ortiz-Medrano et al., 2016). Previous population genetic studies have shown that P. pinceana has a high genetic diversity within and high differentiation among populations (Escalante, 2001;Ledig et al., 2001;Molina-Freaner et al., 2001).
In this study, we aim to identify how phylogeography, historical demography, and the climatic transformation during the Quaternary influenced the current genetic variation and geographic distribution of P. pinceana by addressing the following four goals: (1) characterize its genetic diversity using genome-wide single nucleotide polymorphisms (SNPs) data, (2) identify phylogeographic patterns across its natural distribution range, (3) infer its past demographic history during the Pleistocene and Holocene, and (4) reconstruct its potential distribution during the LGM to infer the principal climatic limitations of the species.

| Sampling and genetic diversity characterization
The 10 sampled localities were distributed across the entire geographic range of P. pinceana ( Figure S1). Individuals were chosen over the whole area. Usually, these populations are formed by 50-100 trees and we tried to sample the whole ecological area. Needles from nine healthy, regularly distributed adult trees were sampled. DNA was extracted using the CTAB protocol (Doyle & Doyle, 1987). Samples with a concentration over 100 ng/ml were double-digested with the restriction enzymes Pstl/Mspl, which were tested in silico in the genome of Pinus lambertiana to generate fragments along different regions across the genome. The fragments were single-end sequenced with an Illumina Hi-Seq 2500 at the Wisconsin Biotechnology Center.
The read adapters were removed using TRIMMOMATIC (v0.36; Bolger et al., 2014). Only sequences with a minimum length of 80 bp and a minimum Phred quality score of 30 were retained. The read quality was visualized with FASTQC (v0.11.7; https://www.bioin forma tics.babra ham.ac.uk/proje cts/fastq c/) and MULTIQC (v1.9; Ewels et al., 2016). High-quality reads were mapped to the P. lambertiana genome assembly (Stevens et al., 2016;v1.5) using the BWA-MEM algorithm (v0.7.10; Li et al., 2009). Unmapped reads were discarded. The MarkDuplicates feature from the Picard package (http://broad insti tute.github.io/picar d/) was used to correct artifacts of PCR duplication. We obtained a set of curated SNPs that were filtered prior to analysis using VCFtools (v0.1.16; Danecek et al., 2011). The filtering process removed SNPs with differences compared with the reference but fixed them in P. pinceana, followed by keeping biallelic alleles with a maximum of 25% maximum missing data and individuals with <0.2 of missing data, and at least 15× of coverage.
To remove putative paralogs, we used HDplot (v0.5-7;McKinney et al., 2017) to filter polymorphisms with H (heterozygosity) larger than 0.6 and D (deviation from even read ratios in heterozygous) outside the range of −10 to +10. These parameters were chosen according to the proportion of heterozygous individuals within a population and allelic ratios within heterozygous individuals. A filter based on linkage disequilibrium (LD) was applied using PLINK v1.9 (with a window size of 50 bp, LD < 0.5, and window shift of 5; Purcell et al., 2007). An additional filter was applied to eliminate polymorphisms out of Hardy-Weinberg equilibrium ( Figure S2). Based on these filtering parameters we expected to get a better representation of neutral loci, reducing the artifact incorporated by the abundance in paralogs and pseudogenes, which are abundant in conifers (Prunier, Verta, & MacKay, 2016), and giving more confidence in the accuracy of variant calls to the coalescent models. We retained a total of 3100 SNPs in 88 individuals.
We estimated the genetic diversity for the 10 populations using observed heterozygosity (H O ), within-population gene diversity (H S ), F I G U R E 1 Geographic locations of the 10 populations sampled of Pinus pinceana analyzed genetically. The biogeographic province that encompasses the Chihuahuan Desert is located inside the sand color area, and the sampled populations from this province are indicated by a black dot; black stars correspond to Chw. The Sierra Madre Oriental is located inside the purple area, and the sampled populations from this area are shown with white dots; white stars correspond to SMOn. and the inbreeding coefficient (F IS ) using the R package Hierfstat (Goudet, 2005). Additionally, DNAsp (v6; Rozas et al., 2017) was used to calculate allelic richness, per site nucleotide diversity (π), and pairwise population differentiation (F ST ; Weir & Cockerham, 1984).

| Phylogeographic and population structure
To estimate the clustering among samples without any prior geographic provenance, we ran a discriminant analysis of principal components (DAPC) with the r package Adegenet (Jombart & Ahmed, 2011). Then, we conducted an ADMIXTURE analysis (v1.3.0; Alexander & Lange, 2011) using the genotypes that were previously converted to ordinary PLINK files (.ped) using PLINK (v1.9;Purcell et al., 2007) to estimate the population structure among individuals for K values from 2 to 10 admixture proportions. We selected the replicate with the lowest log-likelihood value for each value of K.
Then, we chose the value of K with the lowest cross-validation error.
Following the clustering of DAPC, our four scenarios modeled included two scenarios without migration: Divergence, to estimate divergence times between ChD and SMO populations (T DIV ), and effective population sizes (N e ). Contraction, to estimate the divergence times between the ChD (T DIV ) and SMO populations, followed by the independent contraction of both groups during the Holocene. We also modeled two scenarios M, including migrations, based on the former two scenarios with the same parameter ranges but adding independent migration rates among groups ( Figure 2). The Divergence without migration model was set using search ranges for the effective population sizes (N e ) from 100 to 1000 individuals in the SMO and ChD lineages, and T DIV between 275 and 62,500 generations ago. The Contraction model was set using search ranges for the effective population sizes (N e ) of SMO and ChD from 100 to 1000 individuals. T DIV was set to date the divergence event within ChD and SMO between 275 and 62,500 generations ago, and T CON SMO was set to date a contraction event in the SMO group and T CON ChD using the range of 40 and 275 generations ago. Finally, for the two migration models, we used the same range adding between 1 × 10 −9 and 1 × 10 −4 , per base per generation, drawn from a log uniform prior distribution.
For each of the four models, we simulated 100 replicates, 150,000 iterations, and 30 ECM cycles. To select the best scenario, we estimated the Akaike Information Criterion AIC value (Akaike, 1974) using the maximum composite likelihood estimated from the best replicate and compared the models using the weighted Akaike information criterion (wAIC). Finally, we

| Ecological niche modeling and refugia hypotheses
We compiled 37 records verified from Escalante (2001) To construct a potential niche model, we used Maxent (v3.3.3e; Phillips & Dudík, 2008) implemented in the suite Wallace (Kass et al., 2018; Appendix S1), testing linear, quadratic, product, and hinge features and regularization multipliers from 0.5 to 2 with 0.5 intervals. The best fitting model was selected with the ΔAIC (ENMeval implemented by Wallace). The details of the analyses can be found in the R script available in the Appendix S1. We processed the selected model (LQHP Rm = 1.5) by removing the 10th percentile training probability, as described by Liu et al. (2005).
The projection to the Last Glacial Maximum bioclimatic layers from CHELSEA on the PMIP3 data (Krager et al., 2021) was performed using the same parameters for the current distribution. Areas for each model (current and LGM) were estimated by counting pixels with the package raster (Hijmans & van Etten, 2012), where each pixel corresponds to approximately 1 km 2 .

| Genotyping and genetic characterization
The single-end sequences produced an average of 2.8 M reads per individual, which was reduced to 2.54 M reads after quality control. The datasets obtained with GATK produced 605,489 variants.
VCFtools was used to obtain the final processed data with 3100 biallelic SNPs, with a moderate amount of missing data by allele (mean = 0.0912%), and a mean depth per site of 64×.
Globally, the observed heterozygosity (H O ) was 0.0663, the within-population gene diversity (H S ) was 0.0541, and the inbreeding coefficient (F IS ) was −0.2247 ( Table 1)

| Genetic structure and phylogeography pattern
The global F ST value was 0.097 (Table 2). Within SMO, the populations had a mean F ST = 0.0144, compared to 0.012 within the ChD populations. Between biogeographic regions, the mean value was 0.0536.
The discriminant analysis of principal components (DAPC) showed two clear population clusters divided by PC1, describing 17.14% of the total variance while PC2 describes 4.3% of the total variance. PC2 helps to divide the SMO cluster with one subcluster including Tolantongo, San Joaquín, and Maguey Verde and another including La Florida and Nuñez (Figure 3a).
The ADMIXTURE analysis showed that K = 2 supports the clustering in two biogeographic areas. For K = 2, 42 of the 45 samples from SMO were classified 100% in one cluster (Figure 3b). Fortythree of the 45 samples from ChD were classified in the first cluster, while two samples from the northwest population of El Palmito were found with a level of admixture between both groups. The higher values of K tested (3-9) showed sub fragmentation within these two biogeographic areas but with less likelihood.

| Demographic history
We used the entire 3100 SNP dataset to model the population divergence time hypotheses from the demographic simulations. Three of the four scenarios resulted in a divergence time estimated during the Early Pleistocene while the Contraction M scenario estimated the divergence between clusters during the Miocene (Tables S1 and S2).

| Historical distribution projections
The LGM model showed a reduced suitable area compared with the current model ( Figure 5), centered in the northern part of the Sierra

| Phylogeography
We detected a clear geographic structure of genetic diversity of

P. pinceana between the Chihuahuan Desert and the Sierra Madre
Oriental. A similar differentiation has been described in several desert scrub plant species in this area, as we discuss below. This structure is apparently related to climate dynamics during the Quaternary (Castiglia & Fawcett, 2006;Czaja et al., 2014;Ramírez et al., 2003).
These changes highlight the importance of two distinct processes in the Chihuahuan Desert: first, genetic drift due to reductions in effective population sizes in different areas; and second, isolation of populations due to low migration rates.  (Escalante, 2001;Figueroa-Corona, 2012) were substantially higher than the values found here. This is expected because microsatellites have higher mutation rates (Hamblin et al., 2007).
In accordance with previous reports in P. pinceana (Ledig et al., 1986(Ledig et al., , 2001 and in some cases negative F IS averages have been found in some conifers like Cryptomeria japonica var. sinensis, and Pinus patula (Cai et al., 2020;Peláez et al., 2020), while in P. albicaulis, the estimates have been positive (Liu et al., 2016).
We attribute these negative F IS values to two possibilities, first the presence of paralogous SNPs that were not filtered with the ap- The differentiation of the SMO and ChD was attributed to isolation by distance in a previous study based on cpDNA SSR loci (Escalante, 2001). Our results of genetic structure clearly identified two lineages, corresponding to the same two biogeographic regions. Furthermore, these lineages present relatively high levels of genetic differentiation between them (F ST = 0.097), like that typically reported for conifer taxa with wide distribution ranges (Cobo-Simón et al., 2020;Jaramillo-Correa et al., 2020;Peláez et al., 2020).
Nevertheless, the discriminant components analysis shows that SMO and ChD groups have significant levels of admixture between them.
The structured phylogeographic pattern between the ChD and SMO that we detected has been reported in several desert scrub plant species from the ChD (e.g., Sosa et al., 2009;Scheinvar et al., 2017;Vásquez-Cruz & Sosa, 2016). In all cases, a strong phylogeographic structure found within the SMO was correlated with geologic and paleoclimatic changes.

| Demographic history
With a patchy geographic distribution, P. pinceana is not considered an endangered species by the IUCN, but it is considered endangered by Mexican law based on more detailed ecological, geographic, and genetic information (SEMARNAT 059, 2010). Demographic data can help us to apply historical context to this question. The current genetic differentiation patterns of the ChD of the xeric scrubs Ephedra compacta and Agave lechuguilla (Loera et al., 2017;Scheinvar et al., 2017) can be explained by an initial expansion, followed by processes of isolation between lineages. In the case of P. pinceana the isolation is related to the present very reduced zones of secondary contact, and limited migration evidencing the biotic transformation during the Quaternary (Elias & Van Devender, 1990;Lanner & Van Devender, 1981;Van Devender & Burgess, 1985).
The results of the four scenarios that we analyzed give support consistent with changes in climate and the genetic structure found in P. pinceana in the ChD during the Middle Pleistocene. We suggest that this allopatric fragmentation of two lineages, together with an expansion of the xeric scrubs and pine-oak forests after the end of the glacial inter-cycles, was crucial in shaping the present distribution of the genetic diversity in P. pinceana.
Quaternary dynamics in the ChD have been described with geological and biogeographic evidence in the xeric scrub and nonarboreal species in the region. These results agree with the pattern of the geographic distribution of genetic diversity found in this study.
In particular, for Agave lechugilla, Scheinvar et al. (2020) A. striata, and A. stricta (Martínez-Ainsworth, 2013;Scheinvar et al., 2017;Trejo et al., 2016), the rodents Thomomys umbrinus and Perognathus avus, and the turtle Kinosternon avescens (Mathis et al., 2014;Neiswenter & Riddle, 2010;Serb et al., 2001). The estimated migration rates in our study show an asymmetric flux between regions (Table S2). Nevertheless, we still cannot establish a cause for the percentage of admixture in individuals from El Palmito given the low connectivity with nearby populations that was suggested by the ADMIXTURE analysis (Figure 3).

| Genetic diversity in a temporal context
Phylogenetic divergence within Pinus subsection Cembroides is estimated to have occurred during the Miocene (11 Mya;Gernandt et al., 2008;Saladin et al., 2017) or the Oligocene (Jin et al., 2021).
The fossil evidence for pinyon pines records large changes during the Pliocene in their distribution throughout the Central Mexican Plateau (Lanner & Van Devender, 1981;Van Devender & Burgess, 1985).

| Historical distribution
Based on our analyses, we propose that P. pinceana differentiated into two lineages in the Middle Pleistocene (~2.49 Ma; 95% CI: 3.28-1.62), followed by the division of both biogeographic regions (~127.7-~539.2 kya) during the interglacial cycles. The climatic conditions in the habitat of P. pinceana are more suitable at present than they were during the LGM; this is in contrast to what has been described in other conifers like Picea chihuahuana, which has become more restricted due to the disappearance of a suitable habitat making it susceptible to the loss of genetic variability via genetic drift, inbreeding depression and strong selection (Jaramillo-Correa et al., 2006;Quiñonez-Perez et al., 2017).
The recent climatic changes in the ChD, the increment in aridification, and the reduction in water bodies (Czaja et al., 2014) promoted the geographic expansion of P. pinceana, and the phylogeographic patterns inferred from our data, involving a gradual expansion over the ChD since the LGM, in particular to the southern and western part of the distribution. This pattern of expansion has been described in the ChD in other scrub plant species like A. lechuguilla and E. compacta (Loera et al., 2012;Scheinvar et al., 2017). The lack of compatible environmental layers to other points in the past prevents us from exploring the distribution closer to the divergence of the two lineages.
Mazapil and General Cepeda are two of the three most genetically diverse populations but also represent, according to the estimation of the habitat, the most stable region over the climate dynamics since the interglacial cycles. Thus we interpreted that this region was a refuge for P. pinceana. Indeed this region has not been previously described as a refuge for the ChD Loera et al., 2017;Scheinvar et al., 2017Scheinvar et al., , 2020Vásquez-Cruz & Sosa, 2020).

| CON CLUS IONS
The genetic diversity in P. pinceana was modified by dynamics during interglacial cycles in the Pleistocene. The demographic scenarios studied resulted in a ranking of models that were useful in gauging relative support for competing hypotheses.
In particular, the best model involved the divergence into two lineages in the Middle Pleistocene (~2.49 Ma; 95% CI: 3.28-1.62). These lineages later colonized two biogeographic regions, the SMO and the ChD, while interglacial cycles modified the Chihuahuan Desert as the aridification increased and paleolakes shrank. This division probably occurred during the climatic changes of the Pleistocene and was related to the glacial-interglacial cycles.
Thus, the phylogeographic history of P. pinceana is likely explained by climate dynamics that left perceptible marks in the patterns of genetic diversity and structure observed in the species today.

ACK N OWLED G M ENTS
To the memory of Dr. Miguel Angel Soto Arenas, in recognition for the discovery of El Palmito locality. We particularly thank Velia and

CO N FLI C T O F I NTE R E S T
All authors contributed to the manuscript writing and revisions, and declared no conflict of interest.

O PEN R E S E A RCH BA D G E S
This article has earned Open Data and Open Materials badges. Data and materials are available at https://www.ncbi.nlm.nih.gov/biopr oject/ PRJNA 71910 6/.

DATA AVA I L A B I L I T Y S TAT E M E N T
Raw reads are available in the SRA NCBI database (https://www.ncbi. nlm.nih.gov/sra/) deposited under the PRJNA719106 BioProject.